Sequential Modeling for Identifying Gene Locations in Human Genome
Abstract
We consider several sequential processing algorithms for identifying genes in human DNA, based on detecting CpG islands. The algorithms are designed to capture the underlying statistical structure in a DNA sequence. Sequential processing using a Markov model and a hidden Markov model is shown to identify most CpG islands in annotated (marked) DNA subsequences in publicly available DNA data sets. We also consider a wavelet-based hidden Markov tree (HMT). In the context of the HMT, we address the design of adaptive wavelets matched to CpG islands, achieved via lifting and genetic-algorithm optimization.

DNA is composed of a sequence of units called nucleotides [1]: adenine (A), cytosine (C), guanine (G), and thymine (T). In the human genome, a C nucleotide is generally modified chemically by methylation if followed by a G. Methyl-C mutates into a T with high probability. The methylation process is suppressed in localized segments of the genome, often at the “start” regions of many genes. These regions, characterized by a higher concentration of C-G dinucleotides than elsewhere, are called CpG islands (“C precedes G”). Our objective is to produce models for distinguishing variable-length CpG islands from the rest of the DNA sequence. We consider the following algorithms: a
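As a rough illustration of the Markov-model idea mentioned in the abstract, the sketch below scores a DNA window by the log-odds of two first-order Markov chains, one trained on CpG-island examples and one on background sequence. It is not the authors' implementation: the function names (`train_transitions`, `log_odds`), the add-one pseudocount, and the tiny training strings are all assumptions made for this example, and real use would train on annotated genomic data.

```python
import math

ALPHABET = "ACGT"

def train_transitions(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a).

    A pseudocount is added to every transition so that unseen
    dinucleotides never get probability zero (an assumption of this sketch).
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs

def log_odds(seq, island, background):
    """Average log2 ratio of island vs. background transition probabilities.

    Positive values mean the window looks more like the island model.
    """
    score = sum(
        math.log2(island[a][b] / background[a][b])
        for a, b in zip(seq, seq[1:])
    )
    return score / max(len(seq) - 1, 1)

# Hypothetical toy training data, chosen only to make the example run.
island_train = ["CGCGGCGCCGCGCGGC", "GCGCGCCGCGGCGCGC"]
background_train = ["ATTATAAATGTTTACA", "TTAATGCATTTAAATA"]

island_model = train_transitions(island_train)
background_model = train_transitions(background_train)

cg_rich_score = log_odds("CGCGCGCG", island_model, background_model)
at_rich_score = log_odds("ATATTTAA", island_model, background_model)
print(f"CG-rich window score: {cg_rich_score:.3f}")
print(f"AT-rich window score: {at_rich_score:.3f}")
```

With this toy data the CG-rich window scores above zero and the AT-rich window scores below zero. Sliding such a window along a sequence and thresholding the score is one simple way to flag candidate islands. A hidden Markov model instead infers the island boundaries jointly, which suits the variable-length islands described above.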